Linear Manifold Correlation Clustering

Authors

  • Rave HARPAZ
  • Robert HARALICK
Abstract

The detection of correlations is a data mining task of increasing importance due to new areas of application such as DNA microarray analysis, collaborative filtering, and text mining. In these cases object similarity is no longer measured by physical distance, but rather by the behavior patterns objects manifest or the magnitude of correlations they induce. Many approaches have been proposed to identify clusters complying with this requirement. However, most approaches assume specific cluster models, which in turn may lead to biased results. In this paper we present a novel methodology based on linear manifolds which provides a more general and flexible framework for correlation clustering. We discuss two stochastic linear manifold cluster models and demonstrate their applicability to a wide range of correlation clustering situations. The general model provides the ability to capture arbitrarily complex linear dependencies or correlations. The specialized model focuses on simpler forms of linear dependencies, yet generalizes the dependencies often sought by the so-called “pattern” clustering methods. Based on these models we discuss two linear manifold clustering algorithms, the latter a fine-tuned derivative of the first targeting simpler forms of correlation and “pattern” clusters. The efficacy of our methods is demonstrated by a series of experiments on real data from the microarray and collaborative filtering domains. One of the experiments demonstrates that our method is able to identify statistically significant correlation clusters that are overlooked by existing methods.
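The idea that similarity is measured relative to a linear manifold rather than by physical distance can be illustrated with a small sketch. This is not the authors' algorithm or either of their cluster models; it only computes the orthogonal distance from a point to a linear manifold given by an origin and a set of spanning vectors. The function names (`orthonormalize`, `distance_to_manifold`) and the example data are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis

def distance_to_manifold(x, origin, basis):
    """Orthogonal distance from x to the manifold origin + span(basis).

    `basis` must be orthonormal (e.g. the output of `orthonormalize`).
    """
    r = [xi - oi for xi, oi in zip(x, origin)]
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))

# Hypothetical example: a 1-D manifold (a line) in 3-D,
# i.e. points of the form origin + t * (1, 2, 3).
direction = [1.0, 2.0, 3.0]
origin = [1.0, 0.0, -1.0]
basis = orthonormalize([direction])

near = [o + 0.1 * d for o, d in zip(origin, direction)]   # t = 0.1
far = [o + 50.0 * d for o, d in zip(origin, direction)]   # t = 50
# Offset (3, -1.5, 0) is orthogonal to (1, 2, 3), so it moves the point off the line.
off = [near[0] + 3.0, near[1] - 1.5, near[2]]

d_near = distance_to_manifold(near, origin, basis)
d_far = distance_to_manifold(far, origin, basis)
d_off = distance_to_manifold(off, origin, basis)
```

In this sketch `near` and `far` are far apart in Euclidean terms yet both lie on the same line, so both have (numerically) zero distance to the manifold, while `off` is close to `near` in space but at a clearly nonzero distance from the manifold.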


Similar resources

Linear manifold clustering in high dimensional spaces by stochastic search

Classical clustering algorithms are based on the concept that a cluster center is a single point. Clusters which are not compact around a single point are not candidates for classical clustering approaches. In this paper we present a new clustering paradigm in which the cluster center is a linear manifold. Clusters are groups of points compact around a linear manifold. A linear manifold of dime...


A manifold based clustering algorithm and application to object discovery in RGBD data

Traditional clustering algorithms like k-means assume that the data points to be clustered are distributed as spherical blobs. Hence, they fail when the data to be clustered come from multiple underlying manifolds. The situation becomes more complex when the constituent manifolds intersect each other. In this thesis we introduce a new manifold based clustering alg...


Hierarchical Document Clustering Using Correlation Preserving Indexing

This paper presents a spectral clustering method called correlation preserving indexing (CPI). This method is performed in the correlation similarity measure space. Correlation preserving indexing explicitly considers the manifold structure embedded in the similarities between the documents. The aim of the CPI method is to find an optimal semantic subspace by maximizing the correlation between t...


Unsupervised Shape Clustering using Diffusion Maps

The quotient space of all smooth and connected curves represented by a fixed number of boundary points is a finite-dimensional Riemannian manifold, also known as a shape manifold. This makes the preservation of locality a critically important issue when reducing the dimensionality of shapes on the manifold. We present a completely unsupervised clustering algorithm employing diffusion maps for l...


Manifold Learning and Dimensionality Reduction with Diffusion Maps

This report gives an introduction to diffusion maps, some of their underlying theory, as well as their applications in spectral clustering. First, the shortcomings of linear methods such as PCA are shown to motivate the use of graph-based methods. We then explain Locally Linear Embedding [9], Isomap [11] and Laplacian eigenmaps [1], before we give details on diffusion maps and anisotropic diffu...



Publication date: 2007